High-performance Monitoring Architecture for Large-scale Distributed Systems Using Event Filtering
Authors
Abstract
Monitoring is essential for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, system components generate a large number of events during execution or interaction with external objects (e.g., users or processes). Monitoring these events is necessary for observing the run-time behavior of LSD systems and for providing the status information required to debug, tune, and manage such applications. However, correlated events are generated concurrently and may be distributed across various locations in the application environment, which complicates the management decision process and makes monitoring LSD systems an intricate task. In this paper, we present a scalable, high-performance monitoring architecture for LSD systems. It uses an efficient event filtering mechanism to detect and classify interesting local and global events and to disseminate the monitoring information to the corresponding endpoint management applications (such as debugging and reactive control tools). Our architecture also supports dynamic and flexible reconfiguration of the monitoring mechanism via its instrumentation and subscription components. The intrusiveness of the monitoring process is minimized by reducing the event traffic and distributing the monitoring load. We describe and motivate the monitoring approach and the component design of the monitoring system that we are developing to observe the run-time behavior of LSD systems and improve their reliability and performance.
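As a rough illustration of subscription-based event filtering, the sketch below shows a per-node filter that forwards only the events some management application has subscribed to, which is one way event traffic can be reduced near the source. It is not the architecture's actual implementation: the names `Event`, `Subscription`, and `LocalFilter`, and the fields they carry, are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str                                   # component that emitted the event
    kind: str                                     # hypothetical event type, e.g. "error"
    payload: dict = field(default_factory=dict)   # free-form event details


@dataclass
class Subscription:
    consumer: str                                 # name of a management application
    predicate: Callable[[Event], bool]            # selects the events it wants


class LocalFilter:
    """Hypothetical per-node filter: drops events that no subscriber asked for."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def subscribe(self, sub: Subscription) -> None:
        # Subscriptions can be added at run time (dynamic reconfiguration).
        self.subscriptions.append(sub)

    def unsubscribe(self, consumer: str) -> None:
        # Remove every subscription held by the named consumer.
        self.subscriptions = [s for s in self.subscriptions if s.consumer != consumer]

    def process(self, events: List[Event]) -> Dict[str, List[Event]]:
        # Route each event to every consumer whose predicate matches it.
        delivered: Dict[str, List[Event]] = defaultdict(list)
        for event in events:
            for sub in self.subscriptions:
                if sub.predicate(event):
                    delivered[sub.consumer].append(event)
        return dict(delivered)


# Usage: a debugging tool subscribes to error events only.
node_filter = LocalFilter()
node_filter.subscribe(Subscription("debugger", lambda e: e.kind == "error"))
events = [
    Event("worker-1", "heartbeat"),
    Event("worker-2", "error", {"code": 500}),
    Event("worker-1", "heartbeat"),
]
routed = node_filter.process(events)

# After unsubscribing, the same events are filtered out entirely.
node_filter.unsubscribe("debugger")
routed_after = node_filter.process(events)
```

In this sketch the heartbeat events never leave the node because no consumer requested them, while the single error event is delivered to the debugging tool. A hierarchical deployment would presumably chain such filters so that global events are detected at a higher level, but that part is not shown here.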
Similar resources
High-performance Event Filtering for Distributed Dynamic Multi-point Applications: Survey and Evaluation
High-performance event filtering is an essential service in a distributed systems environment. We are developing an event filtering architecture to efficiently process the large volume of event traffic generated by distributed dynamic multi-point (DDMP) applications (such as automated monitoring and fault management in distributed systems). Our architecture supports the dynamic (re)configuratio...
A Scalable Monitoring Architecture for Managing Large-scale Distributed Multimedia Systems
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. In an LDM environment, a large number of events are generated by the system components during their execution or interaction with external objects (e.g., users or processes). Monitoring such events is necessary for observing the run-time behavior of LDM ...
HiFi: A New Monitoring Architecture for Distributed Systems Management
With the increasing complexity of large-scale distributed (LSD) systems, an efficient monitoring mechanism has become an essential service for improving the performance and reliability of such complex applications. This paper presents a scalable, dynamic, flexible and non-intrusive monitoring architecture for managing LSD systems. This architecture, which is referre...
Hierarchical Filtering-based Monitoring System for Large-scale Distributed Applications
On-line monitoring of large-scale distributed (LSD) applications is an effective means to observe the applications' behavior at run-time and provide status information required by debugging and management tools. In this paper, we describe and motivate the architecture and the component design of a scalable, high-performance, dynamic and non-intrusive monitoring system for LSD applications. The...
A scalable monitoring architecture for managing distributed multimedia systems
Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed multimedia (LDM) systems. Monitoring events generated by LDM systems is necessary for observing the run-time behavior of LDM systems and providing status information required for managing such applications. However, correlated events are generated concurrently and could be di...